AITopics | task weight

Collaborating Authors

task weight

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Expectation Error Bounds for Transfer Learning in Linear Regression and Linear Neural Networks

Liu, Meitong, Jung, Christopher, Li, Rui, Feng, Xue, Zhao, Han

arXiv.org Machine Learning | Mar-31-2026

In transfer learning, the learner leverages auxiliary data to improve generalization on a main task. However, the precise theoretical understanding of when and how auxiliary data help remains incomplete. We provide new insights on this issue in two canonical linear settings: ordinary least squares regression and under-parameterized linear neural networks. For linear regression, we derive exact closed-form expressions for the expected generalization error with bias-variance decomposition, yielding necessary and sufficient conditions for auxiliary tasks to improve generalization on the main task. We also derive globally optimal task weights as outputs of solvable optimization programs, with consistency guarantees for empirical estimates. For linear neural networks with shared representations of width $q \leq K$, where $K$ is the number of auxiliary tasks, we derive a non-asymptotic expectation bound on the generalization error, yielding the first non-vacuous sufficient condition for beneficial auxiliary learning in this setting, as well as principled directions for task weight curation. We achieve this by proving a new column-wise low-rank perturbation bound for random matrices, which improves upon existing bounds by preserving fine-grained column structures. Our results are verified on synthetic data simulated with controlled parameters.
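To make the role of a task weight concrete, here is a minimal sketch of weighted least squares that pools a main task with one auxiliary task. It is not the estimator analyzed in the paper: the one-dimensional, no-intercept model, the function name `weighted_slope`, and the toy data are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Task = Tuple[Sequence[float], Sequence[float]]  # (inputs x, targets y)


def weighted_slope(tasks: List[Task], weights: Sequence[float]) -> float:
    """Slope beta of a no-intercept 1-D fit minimizing
    sum_k weights[k] * sum_i (y_ki - beta * x_ki) ** 2.
    (Hypothetical helper, not taken from the paper.)"""
    if len(tasks) != len(weights):
        raise ValueError("need exactly one weight per task")
    numerator = 0.0
    denominator = 0.0
    for (xs, ys), w in zip(tasks, weights):
        numerator += w * sum(x * y for x, y in zip(xs, ys))
        denominator += w * sum(x * x for x in xs)
    if denominator == 0.0:
        raise ValueError("weighted design has no signal")
    return numerator / denominator


# Toy data (assumed): the main task has slope 2, the auxiliary task slope 3.
main = ([1.0, 2.0], [2.0, 4.0])
aux = ([1.0, 2.0, 3.0], [3.0, 6.0, 9.0])

main_only = weighted_slope([main, aux], [1.0, 0.0])  # auxiliary data ignored
pooled = weighted_slope([main, aux], [1.0, 1.0])     # equal task weights
```

With weight zero on the auxiliary task the fit recovers the main-task slope exactly. With equal weights the estimate is pulled toward the auxiliary slope, which is the bias that a well-chosen task weight has to trade against the variance reduction from extra data.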

artificial intelligence, equation, machine learning, (16 more...)

arXiv.org Machine Learning

2603.28739

Country: North America > United States > Illinois (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)

Table 1 Starting from one auxiliary task Exemplar MT we keep

Neural Information Processing Systems | Oct-2-2025, 21:58:13 GMT

We would like to thank all the reviewers for their insightful comments, especially during this difficult time. However, lowering training loss may cause overfitting, especially when training data is scarce. The superiority of ARML is verified in the experiments. The error rate decreases as each new task is added. In 'Baseline + ARML', for fair comparison, we stick to the same training process. We will add more elaboration on this in the final version. We will try other tasks, e.g., reinforcement learning.

artificial intelligence, auxiliary task, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

580c4ec4738ff61d5862a122cdf139b6-Paper-Conference.pdf

Neural Information Processing Systems | Aug-22-2025, 00:22:58 GMT

algorithm, cross-entropy loss, mto algorithm, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Mountain View (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Vision (0.68)

Finding the most relevant auxiliary forecasting tasks for pre-training and knowledge transferring to a given primary

Neural Information Processing Systems | Aug-15-2025, 17:40:18 GMT

We thank the reviewers for their valuable and timely comments. We'd like to first emphasize the challenges and contributions: Section 3.2 explains how to calculate this hyper-gradient of Framework for BackPropagation, LeCun, 1988), and widely adopted in the literature [14, 15, 35]. We would like to further polish the notation to be more consistent. 'Pretrain (Top)' is much better than 'Pretrain (Down)'.

pre-training and knowledge, relevant auxiliary forecasting task, target task, (14 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.32)

AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs

Corrado, Nicholas E., Katz-Samuels, Julian, Devraj, Adithya, Yun, Hyokun, Zhang, Chao, Xu, Yi, Pan, Yi, Yin, Bing, Chilimbi, Trishul

arXiv.org Artificial Intelligence | Jun-3-2025

When aligning large language models (LLMs), their performance on various tasks (such as being helpful, harmless, and honest) depends heavily on the composition of their training data. However, selecting a data mixture that achieves strong performance across all tasks is challenging. Existing approaches rely on large ablation studies, heuristics, or human intuition, but these can be prohibitively expensive and suboptimal. We study this problem in the setting of preference optimization via DPO and introduce AutoMixAlign (AMA), a theoretically grounded algorithm that adaptively mixes datasets during training to balance performance across tasks. AMA first trains specialist models for each task to determine losses that correspond to strong task performance. Then, it trains a generalist model using a novel minimax optimization that prioritizes tasks for which generalist model losses deviate most from specialist model losses. To optimize this problem, we propose two algorithms: (1) AMA-R, which adaptively reweights the objective to prioritize tasks, and (2) AMA-S, which adaptively adjusts how much data is sampled from each task to prioritize tasks. Both algorithms achieve a convergence rate of $O(1/\sqrt{T})$ in the convex case. AMA-R's convergence result follows from Sagawa et al. (2019), and we provide a convergence proof for AMA-S using online learning techniques such as EXP3. We evaluate AMA on several multitask alignment setups and find that AMA outperforms the standard alignment approach -- which simply optimizes the total loss across all tasks -- and also outperforms model merging methods.
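The reweighting idea can be sketched as an exponentiated-gradient update on task weights, in the spirit of the group-DRO style of update the abstract points to. This is only an illustration under assumptions: the function name `update_task_weights`, the clipping of excess loss at zero, and the step size are hypothetical and are not taken from the AMA paper.

```python
import math
from typing import List, Sequence


def update_task_weights(
    weights: Sequence[float],
    generalist_losses: Sequence[float],
    specialist_losses: Sequence[float],
    step_size: float = 1.0,
) -> List[float]:
    """One exponentiated-gradient step: tasks whose generalist loss
    exceeds the specialist reference loss gain weight.
    (Hypothetical helper; clipping at zero is an assumption.)"""
    excess = [
        max(g - s, 0.0) for g, s in zip(generalist_losses, specialist_losses)
    ]
    scaled = [w * math.exp(step_size * e) for w, e in zip(weights, excess)]
    total = sum(scaled)
    return [v / total for v in scaled]  # renormalize onto the simplex


# Toy numbers (assumed): task 0 lags its specialist, task 1 matches it.
new_weights = update_task_weights(
    weights=[0.5, 0.5],
    generalist_losses=[1.0, 0.6],
    specialist_losses=[0.5, 0.6],
)
```

In this toy case the lagging task ends up with the larger weight while the weights still sum to one. The resulting weights could scale per-task losses in the objective or set per-task sampling proportions, which is the distinction the abstract draws between AMA-R and AMA-S.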

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.00569

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining

Fan, Simin, Glarou, Maria Ios, Jaggi, Martin

arXiv.org Artificial Intelligence | May-28-2025

The performance of large language models (LLMs) across diverse downstream applications is fundamentally governed by the quality and composition of their pretraining corpora. Existing domain reweighting algorithms primarily optimize data mixtures for a single target task, thereby resulting in models that overfit to specialized objectives while exhibiting substantial performance degradation on other benchmarks. This paper introduces Group Robust Multi-target Adaptive PrEtraining (GRAPE), a novel multi-source-multi-target domain reweighting framework designed to calibrate pretraining data mixtures for robust performance across multiple target tasks simultaneously. GRAPE dynamically adjusts sampling weights across source domains (domain weights) while concurrently modulating task weights that quantify the relative importance of each individual target task. This adaptive process prioritizes tasks based on their learning difficulty throughout training. We formulate this interleaved reweighting mechanism as a minimax optimization problem: the inner maximization adjusts task weights using group distributionally robust optimization (DRO), where the tasks showing the least improvement under the current data mixture are prioritized with higher weights; the outer minimization then optimizes domain weights to maximize loss reduction on the prioritized tasks. Experiments on ClimbLab and SlimPajama datasets demonstrate that GRAPE consistently outperforms baseline methods in terms of reasoning performance across 6 benchmarks. Furthermore, when applied to multilingual targets, GRAPE effectively identifies optimal training mixtures from mainstream languages, achieving superior language modeling capabilities across 8 low-resource target languages.

domain weight, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.2038

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment (0.92)
Education > Curriculum (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Multi-Task Learning with LLMs for Implicit Sentiment Analysis: Data-level and Task-level Automatic Weight Learning

Lai, Wenna, Xie, Haoran, Xu, Guandong, Li, Qing

arXiv.org Artificial Intelligence | Dec-12-2024

Implicit sentiment analysis (ISA) presents significant challenges due to the absence of salient cue words. Previous methods have struggled with insufficient data and limited reasoning capabilities to infer underlying opinions. Integrating multi-task learning (MTL) with large language models (LLMs) offers the potential to enable models of varying sizes to reliably perceive and recognize genuine opinions in ISA. However, existing MTL approaches are constrained by two sources of uncertainty: data-level uncertainty, arising from hallucination problems in LLM-generated contextual information, and task-level uncertainty, stemming from the varying capacities of models to process contextual information. To handle these uncertainties, we introduce MT-ISA, a novel MTL framework that enhances ISA by leveraging the generation and reasoning capabilities of LLMs through automatic MTL. Specifically, MT-ISA constructs auxiliary tasks using generative LLMs to supplement sentiment elements and incorporates automatic MTL to fully exploit auxiliary data. We introduce data-level and task-level automatic weight learning (AWL), which dynamically identifies relationships and prioritizes more reliable data and critical tasks, enabling models of varying sizes to adaptively learn fine-grained weights based on their reasoning capabilities. We investigate three strategies for data-level AWL, while also introducing homoscedastic uncertainty for task-level AWL. Extensive experiments reveal that models of varying sizes achieve an optimal balance between primary prediction and auxiliary tasks in MT-ISA. This underscores the effectiveness and adaptability of our approach.
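Task-level weighting by homoscedastic uncertainty is often written with one learnable log-variance per task. The sketch below uses one widely used formulation, L_k / (2 * sigma_k^2) + log(sigma_k) summed over tasks. Whether MT-ISA uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def uncertainty_weighted_loss(
    task_losses: Sequence[float], log_variances: Sequence[float]
) -> float:
    """Sum over tasks of 0.5 * exp(-s_k) * L_k + 0.5 * s_k with
    s_k = log(sigma_k ** 2), i.e. L_k / (2 sigma_k^2) + log(sigma_k)."""
    return sum(
        0.5 * math.exp(-s) * loss + 0.5 * s
        for loss, s in zip(task_losses, log_variances)
    )


def log_variance_gradients(
    task_losses: Sequence[float], log_variances: Sequence[float]
) -> List[float]:
    """Partial derivative of the objective above w.r.t. each s_k."""
    return [
        0.5 - 0.5 * math.exp(-s) * loss
        for loss, s in zip(task_losses, log_variances)
    ]


# Toy numbers (assumed): a primary task with loss 2.0, an auxiliary with 0.5.
losses = [2.0, 0.5]
total = uncertainty_weighted_loss(losses, [0.0, 0.0])
grads = log_variance_gradients(losses, [0.0, 0.0])
```

Under gradient descent a negative gradient raises s_k and so lowers the effective weight 0.5 * exp(-s_k) of that task. For a fixed loss the gradient vanishes at s_k = log(L_k), so noisier or harder tasks are down-weighted without manual tuning.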

large language model, machine learning, mt-isa, (18 more...)

arXiv.org Artificial Intelligence

2412.09046

Country:

North America > United States > California (0.14)
Asia > China > Hong Kong (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

MultiBalance: Multi-Objective Gradient Balancing in Industrial-Scale Multi-Task Recommendation System

He, Yun, Chen, Xuxing, Xu, Jiayi, Cai, Renqin, You, Yiling, Cao, Jennifer, Huang, Minhui, Yang, Liu, Liu, Yiqun, Liu, Xiaoyi, Jin, Rong, Park, Sem, Long, Bo, Feng, Xue

arXiv.org Artificial Intelligence | Nov-3-2024

In industrial recommendation systems, multi-task learning (learning multiple tasks simultaneously on a single model) is a predominant approach to save training/serving resources and improve recommendation performance via knowledge transfer between the jointly learned tasks. However, multi-task learning often suffers from negative transfer: one or several tasks are optimized less well than when they are trained separately. To carefully balance the optimization, we propose a gradient balancing approach called MultiBalance, which is suitable for industrial-scale multi-task recommendation systems. It balances the per-task gradients to alleviate negative transfer, while avoiding the huge cost of grid search or manual exploration for appropriate task weights. Moreover, compared with prior work that normally balances the per-task gradients of shared parameters, MultiBalance is more efficient since it only requires access to per-task gradients with respect to the shared feature representations. We conduct experiments on Meta's large-scale ads and feeds multi-task recommendation system, and observe that MultiBalance achieves significant gains (e.g., 0.738% improvement for normalized entropy (NE)) with neutral training cost in Queries Per Second (QPS), which is significantly more efficient than prior methods that balance per-task gradients of shared parameters with 70~80% QPS degradation.
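The abstract does not spell out the balancing rule, so the sketch below substitutes a classic technique: the closed-form minimum-norm convex combination of two task gradients used in multi-objective gradient methods. It is not MultiBalance itself. The vectors stand in for per-task gradients with respect to a shared representation, and all names and numbers are assumptions.

```python
from typing import List, Sequence


def min_norm_coefficient(g1: Sequence[float], g2: Sequence[float]) -> float:
    """alpha in [0, 1] minimizing ||alpha * g1 + (1 - alpha) * g2||^2."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(g1, g2))
    if diff_sq == 0.0:
        return 0.5  # identical gradients: any alpha gives the same vector
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / diff_sq
    return min(1.0, max(0.0, alpha))


def balanced_gradient(g1: Sequence[float], g2: Sequence[float]) -> List[float]:
    """Combine two task gradients with the min-norm coefficient."""
    alpha = min_norm_coefficient(g1, g2)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]


# Toy gradients (assumed) with respect to a 2-D shared representation.
orthogonal = balanced_gradient([1.0, 0.0], [0.0, 1.0])
dominated = balanced_gradient([1.0, 0.0], [2.0, 0.0])
```

Orthogonal gradients of equal norm are averaged. When one gradient is a scaled-up copy of the other, the smaller one is returned, so a single task with large gradients cannot dominate the shared update. Computing these quantities on representation gradients rather than on all shared parameters keeps the vectors small, which matches the efficiency argument in the abstract.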

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

2411.11871

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
